MODELING OF DIFFUSION THROUGH A NETWORK: A NEW APPROACH USING
CELLULAR AUTOMATA AND NETWORK SCIENCE TECHNIQUES


EXECUTIVE  SUMMARY 


Scientists  and  mathematicians  have  been  trying  for  centuries  to  correctly  model  behavior 
of  groups  of  people.  Behavior  can  vary  from  something  as  simple  as  what  happens  if  one 
member  of  the  group  leaves  the  group  to  something  as  complicated  as  the  effect  of  a  natural 
disaster  on  a  group’s  cohesion.  This  paper  focuses  on  the  behavior  associated  with  diffusion  or 
spread  throughout  a  group.  Exactly  what  is  spreading  throughout  the  group  is  not  defined;  it 
could  be  a  disease  or  it  could  be  a  piece  of  information.  Currently  there  are  several  mathematical 
models  that  predict  how  diffusion  through  a  group  occurs.  Some  have  been  used  to  predict 
infection  rates  in  large  populations;  others  have  been  used  to  pinpoint  individuals  that  act  as  key 
information  spreaders  such  as  a  local  gossiper.  The  goal  of  this  paper  is  to  offer  an  alternative 
method  of  modeling  this  diffusion  and  provide  some  insight  into  why  this  alternative  might  be 
more  accurate. 

Procedure 

Many  of  the  current  models  of  diffusion  assume  random  mixing.  This  would  be 
equivalent  to  putting  a  drop  of  dye  in  a  glass  of  water  and  stirring  it.  The  dye  will  diffuse 
throughout  the  entire  glass  through  random  movement  of  the  water  molecules.  Another  large 
segment of these models assumes detailed knowledge of the underlying network connections.
For instance, an airline company might know exactly which cities have flights connecting them
and so would have a detailed understanding of how people diffuse around the globe by air.
Neither of these two assumptions, random mixing and knowledge of network connections, is
always valid. Consider the spread of a disease through a city. To assume that any two
individuals in the city are as likely to pass the disease to each other as any other two
individuals would be incorrect. Certainly people who go to the same office or the same
grocery store, or who live in the same house, are more likely to spread disease to one another
than two people who share no such spaces. To assume detailed knowledge of the network of
interactions would also likely be incorrect. Knowing who interacts with whom is
possible in a small office or school classroom, but for an entire city the number of possible
connections grows too large.

Findings 

The  tool  presented  in  this  paper  uses  a  cellular  automata  (CA)  based  model  to  avoid  both 
invalid  assumptions.  Instead  of  assuming  random  mixing,  the  CA  model  assumes  random 
connections. Since the connections do not change, they better represent the relationships and
interactions that exist in reality. Also, only the general connectedness of a network needs to be
known to apply random connections. Finding how many interactions people have on average is far
easier than identifying every interaction. Thus the CA model avoids both assumptions,
creating a better model in the process.

We validate the CA tool by comparing its output against the well-known SIS and SIR
models. We then use the CA tool to show important effects that may be masked by
assuming random mixing. Definite variations occur as a result of the underlying
connections in any network. The tool takes these into account by keeping the network
connections static throughout one simulation period and by allowing batch runs, so that multiple
variations of network connections may be tested under similar conditions and the results
averaged. In the end, this research project concludes with the creation of a new
method for modeling diffusion through a network and a justification of that method through a
critique of existing models.

I.  INTRODUCTION 

Network  science  is  an  emerging  field  of  study  that  uses  the  scientific  method  to  examine 
an  array  of  networks  to  derive  a  series  of  principles  or  theorems  to  describe  the  behavior  of  those 
networks. The U.S. Army became interested in network science in the late 1990s, when the idea
of network-centric warfare emerged with the publication of the book of the same name
(Alberts, Garstka, & Stein, 1999). Leading the way in the creation of network-centric
operations, the U.S. Army has created a Network Science Center at our very own U.S. Military
Academy. Traditionally, the focus of network-based operations has been on friendly
communication  networks  and  information  distribution  on  the  battlefield.  This  idea  stems  from 
the  theory  that  information  superiority,  just  like  other  more  familiar  terms  (air  superiority,  fire 
superiority),  is  the  next  step  the  U.S.  Military  must  work  to  accomplish  in  order  to  dominate  in 
the  information  age. 

Through  network  science  the  U.S.  Army  hopes  to  develop  methods  to  increase  the 
accuracy,  timeliness,  and  relevance  of  information  that  it  uses  to  win  the  nation’s  wars. 
However,  network  science  can  also  be  used  to  better  understand  enemy  forces.  For  instance, 
knowing  how  the  communication  networks  of  the  enemy  work  allows  one  to  target  only  the  most 
critical  components  of  that  network  to  bring  down  the  whole  system.  Also,  knowing  how  the 
enemy  communicates  provides  a  tool  that  can  be  used  to  predict  the  enemy  response.  So  not 
only  does  the  study  of  networks  afford  the  U.S.  Army  greater  information  sharing  abilities,  it 
could  also  give  a  better  understanding  of  enemy  operations  and  communications  systems. 
[Figure omitted; axis label: % Relevant Information]

Figure 1 Superior Information Position (From Alberts, Garstka, & Stein, 1999)

Beyond the specific military applications of networks, there are several other possible
uses. Like the spread of information, modeling the spread of disease through a population is
uniquely suited to study by network science. Modeling the flow of products throughout a
consumer  network  or  the  delegation  throughout  a  corporate  network  are  also  possible  areas 
where  network  science  can  offer  help.  In  the  information  age  there  are  now  so  many  different 
networks  that  network  study  can  be  applied  to  all  kinds  of  fields.  So  while  still  in  its  infancy, 
network  science  holds  the  promise  of  finding  unique  solutions  to  complex  problems  in  an 
increasingly  connected  world. 

Scientists  and  mathematicians  have  been  trying  for  centuries  to  correctly  model  behavior 
of  groups  of  people.  Behavior  can  vary  from  something  as  simple  as  what  happens  if  one 
member  of  the  group  leaves  the  group  to  something  as  complicated  as  the  effect  of  a  natural 
disaster  on  a  group’s  cohesion.  This  paper  focuses  on  the  behavior  associated  with  diffusion  or 
spread  throughout  a  group.  Exactly  what  is  spreading  throughout  the  group  is  not  defined;  it 
could  be  a  disease  or  it  could  be  a  piece  of  information.  Currently  there  are  several  mathematical 
models that predict how diffusion through a group occurs. Some have been used to predict
infection  rates  in  large  populations;  others  have  been  used  to  pinpoint  individuals  that  act  as  key 
information  spreaders  such  as  a  local  gossiper.  The  goal  of  this  paper  is  to  offer  an  alternative 
method  of  modeling  this  diffusion  and  provide  some  insight  into  why  this  alternative  might  be 
more  accurate. 

Many  of  the  current  models  of  diffusion  assume  random  mixing.  This  would  be 
equivalent  to  putting  a  drop  of  dye  in  a  glass  of  water  and  stirring  it.  The  dye  will  diffuse 
throughout  the  entire  glass  through  random  movement  of  the  water  molecules.  Another  large 
segment of these models assumes detailed knowledge of the underlying network connections.
For instance, an airline company might know exactly which cities have flights connecting them
and so would have a detailed understanding of how people diffuse around the globe by air.
Neither of these two assumptions, random mixing and knowledge of network connections, is
always valid. Consider the spread of a disease through a city. To assume that any two
individuals in the city are as likely to pass the disease to each other as any other two
individuals would be incorrect. Certainly people who go to the same office or the same
grocery store, or who live in the same house, are more likely to spread disease to one another
than two people who share no such spaces. To assume detailed knowledge of the network of
interactions would also likely be incorrect. Knowing who interacts with whom is
possible in a small office or school classroom, but for an entire city the number of possible
connections grows too large.

The  tool  presented  in  this  paper  uses  a  cellular  automata  (CA)  based  model  to  avoid  both 
invalid  assumptions.  Instead  of  assuming  random  mixing,  the  CA  model  assumes  random 
connections. Since the connections do not change, they better represent the relationships and
interactions that exist in reality. Also, only the general connectedness of a network needs to be
known to apply random connections. Finding how many interactions people have on average is far
easier than identifying every interaction. Thus the CA model avoids both assumptions,
creating a better model in the process.

We validate the CA tool by comparing its output against the well-known SIS and SIR
models. We then use the CA tool to show important effects that may be masked by
assuming random mixing. Definite variations occur as a result of the underlying
connections in any network. The tool takes these into account by keeping the network
connections static throughout one simulation period and by allowing batch runs, so that multiple
variations of network connections may be tested under similar conditions and the results
averaged. In the end, this research project concludes with the creation of a new
method for modeling diffusion through a network and a justification of that method through a
critique of existing models.

II.  BACKGROUND 


A. Graph Theory

Mathematics  has  provided  a  number  of  different  models  to  describe  diffusion  through  a 
network.  Since  networks  are  generally  constructed  either  to  model  or  facilitate  diffusion,  being 
able  to  describe  this  diffusion  in  a  numerical  expression  is  of  great  importance.  We  begin  with  a 
review  of  some  basic  ideas  in  graph  theory. 

A  graph  is  a  collection  of  vertices  (or  nodes)  and  edges  that  connect  pairs  of  vertices 
known  as  neighbors.  A  graph  may  be  directed,  meaning  flow  from  one  vertex  to  another  is 
restricted  to  only  one  direction,  or  it  may  be  undirected,  in  which  case  flow  is  unrestricted  in 
direction  between  nodes.  Nodes  have  a  number  of  attributes  that  conveniently  describe  their 
place  within  a  graph  and  relative  number  of  neighbors.  The  degree  of  a  node  is  equal  to  the 
number  of  neighbors  that  node  has.  Nodes  also  typically  have  some  kind  of  state  that  depends 
on  the  type  of  network  being  modeled.  A  disease-tracking  graph  may  have  nodes  with  states 
such  as:  infected,  immune,  not  infected,  recovered  from  infection.  The  undirected  graph  in 
Figure  2  has  four  nodes,  with  node  1  having  a  degree  of  three  while  node  2  has  a  degree  of  one. 
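
As an illustration, the degree computation can be sketched in a few lines of Python. The edge list below is an assumption: the text states only that node 1 has degree three and node 2 has degree one in a four-node graph, which is consistent with (but does not prove) a star centered on node 1. The names `build_adjacency` and `degree` are hypothetical helpers and are not part of the CA tool.

```python
# Assumed edge set: a star centered on node 1 (not confirmed by the source).
edges = [(1, 2), (1, 3), (1, 4)]

def build_adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)  # undirected: store both directions
    return adjacency

def degree(adjacency, node):
    """The degree of a node is the number of neighbors it has."""
    return len(adjacency.get(node, ()))

adjacency = build_adjacency(edges)
print(degree(adjacency, 1), degree(adjacency, 2))
```

For a directed graph, the second `add` call would be dropped so that each edge is stored in one direction only.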
B. Existing Diffusion Models

1. Threshold v. Independent Cascade Models¹

Two general types of diffusion models are the threshold and independent cascade
models. These models are named for the conditions under which a node changes state. The
threshold model gives each node its own threshold, which must be reached before the node will
change state. The independent cascade model gives each node a single chance to change the state
of neighboring nodes; each node acts independently of the others.

In the threshold model each neighbor may be given a certain weight with which it will
affect its neighbor(s). For a particular node to change state, the combined weight of that node’s
neighbors acting on it must meet the threshold, as specified in Inequality 1.

\[
\sum_{v=1}^{\text{degree of } w} b_{v,w} \geq \theta_w
\]

Inequality 1 Threshold Weights (From Kempe, Kleinberg, & Tardos, 2003)

In this summation, the weights (b) of the neighbors (v) acting on w are added together. If the sum
is less than the threshold θ of the node w, then the node will not change state. This
computation is done for each node at each time step (t).
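
A minimal sketch of one time step of the threshold model follows, assuming that a node changes state once the combined weight of its active neighbors meets its threshold. The function name `threshold_step`, the dictionary layout, and the example weights and thresholds are illustrative assumptions and are not taken from Kempe, Kleinberg, & Tardos.

```python
def threshold_step(active, weights, thresholds):
    """Apply one synchronous time step of the threshold model.

    active:     set of currently active nodes
    weights:    dict mapping (v, w) -> weight b of neighbor v acting on w
    thresholds: dict mapping w -> threshold theta of node w
    """
    newly_active = set()
    for w, theta in thresholds.items():
        if w in active:
            continue  # already active; nothing to decide
        # Sum the weights of the active neighbors acting on w.
        total = sum(b for (v, target), b in weights.items()
                    if target == w and v in active)
        if total >= theta:
            newly_active.add(w)
    return active | newly_active

# Hypothetical example: a and b act on c; c acts on d.
weights = {("a", "c"): 0.3, ("b", "c"): 0.4, ("c", "d"): 0.5}
thresholds = {"c": 0.6, "d": 0.5}
step1 = threshold_step({"a", "b"}, weights, thresholds)
step2 = threshold_step(step1, weights, thresholds)
```

Because the update is synchronous, node d cannot activate in the same step as node c; it has to wait until c is already active at the start of a step.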

The second type of model, the independent cascade model, simply gives each of the
neighbors of a node w some independent probability of affecting the state of w. For example, if a
neighbor of w called v becomes active at time t, it is given a single chance to activate w with
probability p(v). If v succeeds, then w becomes active at time t+1. If there are multiple
neighbors of w that become active at time t, then they are all given a chance to activate w in a
random order.

¹ Kempe, Kleinberg, & Tardos, 2003, 138
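
The independent cascade process can be sketched as below. This is a simplified illustration under stated assumptions: each node v carries a single activation probability p(v), as in the description above, and the names `independent_cascade`, `neighbors`, and `prob` are hypothetical.

```python
import random

def independent_cascade(neighbors, prob, seeds, rng=random):
    """Run an independent cascade until no new node becomes active.

    neighbors: dict mapping node -> list of neighboring nodes
    prob:      dict mapping node v -> probability p(v) that v activates a neighbor
    seeds:     nodes that are active at time 0
    """
    active = set(seeds)
    frontier = list(seeds)           # nodes that became active in the last step
    while frontier:
        rng.shuffle(frontier)        # newly active nodes act in random order
        next_frontier = []
        for v in frontier:
            for w in neighbors.get(v, ()):
                # v gets exactly one chance to activate each inactive neighbor w
                if w not in active and rng.random() < prob[v]:
                    active.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
    return active

# Hypothetical four-node path graph.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
everyone = independent_cascade(path, {n: 1.0 for n in path}, [1])
```

Since a node only appears in the frontier once, it never gets a second chance to activate a neighbor, which is what distinguishes this process from the threshold model.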

2. Bass Model²

One of the earlier models of diffusion is known as the Bass Model (Jackson, 2008). This
model does not involve network science but is a commonly used and well-known model of
diffusion. The model depends on two parameters: the first is the rate (p) at which nodes
spontaneously become active and the second is the rate (q) at which nodes become active due to
the activity of neighboring nodes. For simplicity it can be assumed that there are only two states,
active and inactive. The model predicts F(t), the fraction of nodes that are active at time t, by the
following equation:

\[
F(t) = \frac{1 - e^{-(p+q)t}}{1 + \frac{q}{p}\, e^{-(p+q)t}}
\]

Equation 2 Bass Model (From Jackson, 2008)

In this equation, p is the rate of spontaneous activity and q is the rate of activation due to
surrounding activity. Using this model, approximating the fraction of active nodes at any time,
and hence their number in a population of known size, is quite straightforward.
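
The Bass curve is simple to evaluate directly. The sketch below reads F(t) as the fraction of the population that is active (the expression starts at 0 and approaches 1); the function name `bass_fraction` and the parameter values p = 0.03 and q = 0.38 are assumptions chosen only for illustration.

```python
import math

def bass_fraction(t, p, q):
    """Fraction of nodes active at time t under the Bass model.

    p: rate of spontaneous activation (must be positive)
    q: rate of activation due to already active nodes
    """
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)

# Illustrative parameter values (assumed, not from the source).
curve = [bass_fraction(t, p=0.03, q=0.38) for t in range(0, 31, 5)]
```

Multiplying each value by the population size gives an approximate count of active nodes at that time.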

3. SIS & SIR Model³

Another diffusion model known as the SIS model (Susceptible, Infected, and Susceptible)
is more commonly associated with modeling the spread of disease. This model is based on a
node being either infected or not infected but susceptible to further infection. This model differs
from the SIR model (Susceptible, Infected, and Recovered) in that the SIR model incorporates a
state in which a node is no longer susceptible to infection either through immunity or death.
Important to both of these models is the fact that random mixing of individuals is assumed, so
that there are no predefined pathways for disease spread. The SIS model is the simpler of the
two since all nodes return to their original state after infection. Its measure of the average
infection rate is rather simple to calculate and depends on two factors. The first is the degree
distribution P, and the second is the fraction of individuals of degree d who are infected, ρ(d).
Then, the average infection rate of the population is

\[
\rho = \sum_{d = 0, 1, 2, \ldots} P(d)\, \rho(d).
\]

Equation 3 SIS Model (From Jackson, 2008)

² Jackson, 2008, 187

³ Ibid., 196.
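
The average infection rate in Equation 3 is a weighted average of the per-degree infected fractions, with the degree distribution supplying the weights. A small worked example follows; the function name and the numbers are assumptions made only to show the arithmetic.

```python
def average_infection_rate(degree_dist, infected_fraction):
    """Weighted average of infected fractions over the degree distribution.

    degree_dist:       dict mapping degree d -> P(d), the share of nodes with degree d
    infected_fraction: dict mapping degree d -> fraction of degree-d nodes infected
    """
    return sum(share * infected_fraction.get(d, 0.0)
               for d, share in degree_dist.items())

# Hypothetical population: half the nodes have degree 1, and so on.
P = {1: 0.5, 2: 0.3, 3: 0.2}
infected = {1: 0.1, 2: 0.2, 3: 0.4}
rate = average_infection_rate(P, infected)  # 0.05 + 0.06 + 0.08
```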


The SIR model, of which the Kermack-McKendrick model is a common type, is a little more
complex. It consists of a system of nonlinear ordinary differential equations, one for each state a
node could be in: S (susceptible), I (infected/infectious), and R (removed):

\[
\begin{aligned}
\frac{dS(t)}{dt} &= b\,S(t) - v\,I(t)\,S(t) \\
\frac{dI(t)}{dt} &= v\,I(t)\,S(t) - c\,I(t) \\
\frac{dR(t)}{dt} &= c\,I(t)
\end{aligned}
\]

Equation 4 SIR Equations (From Weisstein)


where v is the infection rate, b is the birth rate, and c is the immunity rate. The behavior of this
model depends upon the ratio of the initial number of susceptible individuals (S) times the
infection rate (β) to the recovery rate (γ), where β and γ correspond to v and c above.

\[
\frac{\beta S}{\gamma}
\]

Equation 5 SIR Ratio (From Weisstein)


If this ratio is greater than 1, then each person who is infected will infect more than one other
person and thus the disease will spread. If it is less than 1, then the disease will die out quickly.
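
The threshold behavior can be checked numerically. The sketch below integrates the three SIR equations with a simple forward Euler scheme, which is an assumed choice of solver; the step size, the parameter values, and the names `simulate_sir` and `threshold_ratio` are likewise illustrative assumptions. The birth rate is set to zero by default so that the total population stays constant.

```python
def simulate_sir(S0, I0, R0, v, c, b=0.0, dt=0.01, steps=2000):
    """Forward Euler integration of the SIR equations.

    v: infection rate, c: recovery (immunity) rate, b: birth rate.
    Returns a list of (S, I, R) tuples, one per time step.
    """
    S, I, R = float(S0), float(I0), float(R0)
    history = [(S, I, R)]
    for _ in range(steps):
        dS = b * S - v * I * S
        dI = v * I * S - c * I
        dR = c * I
        S, I, R = S + dt * dS, I + dt * dI, R + dt * dR
        history.append((S, I, R))
    return history

def threshold_ratio(S0, v, c):
    """Initial susceptibles times infection rate, divided by recovery rate."""
    return v * S0 / c

# Assumed values: ratio above 1 (outbreak) and below 1 (die-out).
outbreak = simulate_sir(700, 1, 0, v=0.001, c=0.3)
die_out = simulate_sir(100, 1, 0, v=0.001, c=0.5)
```

In the first run the ratio is above 1 and the infected count rises before falling; in the second it is below 1 and the infected count only decays.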


C.  Cellular  Automata 

While the concept of cellular automata has no immediate relationship with diffusion
through a network, it does have some specific applications to our particular objectives. A cellular
automaton (CA) consists of a collection of cells that change over discrete time intervals according
to a system of rules. These rules are applied at each time step to decide the state at the next time
step. Each time step is known as a generation. Often changes in the cells are depicted in the
form of coloration; in its simplest form this may be a change from black to white or vice versa.
Cells may be in a single line representing a one-dimensional environment, or they may be placed
in a grid to simulate a two-dimensional situation. There are no universal generic rules for
cellular automata, but often particular rules are applied in a manner similar to the threshold models
described in section B.1. The state of a cell in a future generation often depends upon the cell’s
current state and the count of states of neighboring cells. In CA, neighboring cells are generally
defined as those cells directly adjacent to the cell in question. For instance, in a two-state system
a cell may enter state 2 in the next generation if in the current generation the cell is in state 1
and has two neighbors in state 2.
[Figure omitted: example cell states in generation k and generation k+1]
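
The example rule for a two-state system can be written as a short update function. This is a sketch for a one-dimensional CA in which the neighbors of a cell are the cells directly to its left and right; the treatment of the two end cells (they simply have one neighbor) and the name `next_generation` are assumptions.

```python
def next_generation(cells):
    """Apply the example rule once to a one-dimensional CA.

    A cell in state 1 enters state 2 if both of its adjacent
    neighbors are in state 2; all other cells keep their state.
    """
    n = len(cells)
    new_cells = list(cells)  # read from the old generation, write to the new one
    for i, state in enumerate(cells):
        neighbors = [cells[j] for j in (i - 1, i + 1) if 0 <= j < n]
        if state == 1 and neighbors.count(2) == 2:
            new_cells[i] = 2
    return new_cells

gen_k = [2, 1, 2, 1, 1]
gen_k1 = next_generation(gen_k)
```

Reading from the old list while writing to a copy keeps the update synchronous, so a change made to one cell cannot influence its neighbors within the same generation.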


D.  Objective  and  Research  Question 

The  objective  of  this  paper  is  to  describe  the  design  and  implementation  of  a  new  method 
for  modeling  spread  of  information.  This  method  will  seek  to  combine  elements  of  both  network 
graphs and geographically-based models, building upon the theory of cellular automata. CA is
useful because the underlying network structure that connects cells and the time step logic
enable easy modeling of large or complex situations.

We  use  a  custom-made  form  of  CA,  and  interlace  some  ideas  from  network  graphs  such 
as  “wormhole”  links  that  connect  cells  that  are  not  adjacent.  We  also  introduce  some  degree  of 
randomness  into  the  system  using  probability  distributions  to  determine  whether  information  will 
flow  from  a  cell  to  its  neighbor. 

We verify the validity of the model by comparing it to the existing models described
above before using it to draw other conclusions about diffusion in a network. The final
objective is the formulation of conclusions about diffusion through networks, which can be used
to evaluate current models with data generated by the CA tool.
E. Justification of Study

In the Army, or indeed in any organization, communication is an aspect of leadership that
is highly valued. Much research has been done and money invested into the study of information
flow and communication between individuals and within a large group. Especially in the realm
of network science, the study of entities interacting as part of a larger group has unearthed
several useful models or methods for information spread. While these models often produce
useful and interesting results, they can be difficult to construct based on incomplete knowledge
of the edges in the network graph. Strictly geography-based models used for modeling spread
simply by proximity of the entities to one another fail to account for long-distance
communications that allow some entities to share information at a distance.

Outside  of  network  science  there  are  many  models  that  don’t  take  into  account  any 
underlying  network  when  modeling  diffusion.  It  is  one  of  the  goals  of  this  paper  to  underscore 
the  importance  of  recognizing  the  underlying  graph  of  a  network  or  at  the  very  least  taking  into 
account  some  basic  properties  of  that  graph.  One  of  the  more  visually  appealing  as  well  as  easily 
implemented  methods  for  modeling  spread  geographically  is  through  the  use  of  CA.  It  is  for 
these  reasons  that  a  combination  of  CA  and  network  theory  will  create  a  unique  and  useful  model 
of  diffusion. 


III.  MODELING  TOOLS 

A major goal of this research project, and one that took a vast amount of the time
available to complete it, was the creation of a modeling tool from which to gather data. The
initial idea was to create a CA-based computer simulator that was customizable and easy to use.
Customization was easy to accomplish since the tool was built from the ground up, and the ease
of use came mostly from the graphical user interface. Once the CA tool had been created and
modified to an acceptable point of operation, we created a second, Excel-based tool. This
experimental tool was based on the recently constructed CA tool, but instead of the discrete
states of 0 and 1 used in CA, its computations were based on probabilities.


A.  Cellular  Automata  Tool 

The  CA  tool,  pictured  in  Figure  4,  is  designed  to  be  easy  to  use  and  highly  customizable. 
It  allows  the  user  to  modify  the  size  of  the  CA  universe  and  the  number  of  generations  to  run.  It 
can perform batch runs of various scenarios to allow easy computation of averages. The CA rules can
be modified to model different scenarios such as SIS or SIR scenarios. Even the connections
between cells can be modified using the available adjacency matrix, or the default setting of left
and right adjacent neighbor connections may be used.
Figure 4 Cellular Automata Tool


In the visual window each successive generation is displayed in the row below. Likewise,
the state count window displays the total number of cells in each state in each generation by row.
In the adjacency table, filled-in squares represent connections between cells in the visual window.
Checking or unchecking the symmetric box enforces or relaxes symmetry in the adjacency matrix,
producing an undirected or a directed graph respectively. Depending on the scenario, the adjacency
table may be either in a default configuration with left and right neighbor connections only, or
populated with a random fill of connections according to the level of connectedness desired.
The default configuration may also be given additional random connections according to the desired
level of connectedness. The entire tool was built from scratch in Java, allowing for
complete customization.
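
The adjacency table described above can be sketched as follows. The tool itself is written in Java and its source is not reproduced here, so this Python version is only an approximation under stated assumptions: the two end cells are not wrapped around to each other, each additional connection is drawn independently with probability equal to the desired connectedness, and the name `make_adjacency_matrix` is hypothetical.

```python
import random

def make_adjacency_matrix(n, connectedness=0.0, symmetric=True, rng=random):
    """Build an n-by-n 0/1 adjacency matrix for a CA universe of n cells.

    Starts from the default left/right neighbor links, then adds random
    links so that roughly `connectedness` of the possible pairs are joined.
    """
    adj = [[0] * n for _ in range(n)]
    # Default configuration: each cell is linked to its adjacent cells.
    for i in range(n - 1):
        adj[i][i + 1] = 1
        adj[i + 1][i] = 1
    # Additional random links (assumed: independent draw per pair).
    for i in range(n):
        start = i + 1 if symmetric else 0
        for j in range(start, n):
            if i != j and rng.random() < connectedness:
                adj[i][j] = 1
                if symmetric:
                    adj[j][i] = 1  # mirror the link for an undirected graph
    return adj

default_links = make_adjacency_matrix(5)
```

The matrix is generated once before a run and then left unchanged, which matches the static connections the tool relies on; a batch run would call the function again with a fresh random draw for each repetition.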
B. Experimental Excel Tool


The  Excel  model  was  created  to  explore  specific  questions  inspired  by  the  CA  tool.  Excel 
is  especially  well-suited  for  CA  modeling  because  it  can  easily  be  used  to  perform  repeated 
operations  over  a  finite  and  discrete  amount  of  time.  This  model  focused  on  probabilities  of 
infection  rather  than  discrete  cell  states  of  infected  or  uninfected,  in  order  to  examine  the 
approximate long-term behavior of the system. Figure 5 below depicts an example simulation of
a population of ten individuals. The initial infection rate was 20% and the chances of
transmission and recovery were 20% and 30% respectively. The adjacency matrix in the upper
right corner of the figure operates similarly to the matrix in the CA tool. The colors of the
columns indicate their infection rate, with red being more likely to become infected and green
less likely.
Figure 5 Excel Model of Infection




The entire model was built using formulas so that the initial infection rate, recovery and
transmission rates, as well as the percent connectedness of the network could easily be changed
to affect the whole model. There is no batch run ability for this model so computing overall
averages for a given scenario is not possible. There were some interesting results, however,
which will be examined in the findings section of this paper.
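
The formulas used in the Excel model are not given in the text, so the update rule below is an assumption offered only to show what a probability-based version of the CA could look like. It treats each cell's value as its probability of being infected, lets an infected cell stay infected with probability one minus the recovery rate, and lets a susceptible cell become infected unless it escapes every connected neighbor. The name `step_probabilities` is hypothetical.

```python
def step_probabilities(p, adj, transmit, recover):
    """Advance infection probabilities by one generation (assumed rule).

    p:        list of current infection probabilities, one per individual
    adj:      0/1 adjacency matrix; adj[j][i] == 1 means j can infect i
    transmit: chance of transmission along one connection
    recover:  chance that an infected individual recovers
    """
    n = len(p)
    new_p = []
    for i in range(n):
        escape = 1.0  # probability that no neighbor infects individual i
        for j in range(n):
            if adj[j][i]:
                escape *= 1.0 - transmit * p[j]
        stays_infected = p[i] * (1.0 - recover)
        newly_infected = (1.0 - p[i]) * (1.0 - escape)
        new_p.append(stays_infected + newly_infected)
    return new_p

# Rates quoted for the Figure 5 example: 20% transmission, 30% recovery.
isolated = step_probabilities([0.2], [[0]], transmit=0.2, recover=0.3)
```

With no connections the probability simply decays by the recovery rate each generation; connections add the second term and can hold the long-term probability above zero.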

IV. RESULTS


A.  Validation 

We  begin  our  discussion  of  results  with  a  validation  of  the  tool  being  used  to  model 
diffusion.  We  will  use  the  SIS  and  SIR  models  described  above  and  compare  the  output  to  data 
from computer simulations using our CA tool. The simpler of the two models (SIS) can
generally be described by the figure below:


[Figure omitted; horizontal axis: Timestep]

Figure 6 Simulation and Solutions to SIS Model (From Ediger, 2010)


This figure depicts a simulation average, a numerical solution, and an analytical solution to the
SIS model, measuring the number of infected nodes over time. The curve exhibits
a logistic pattern. Similar results were obtained from an SIS model simulation using our
computer-based CA:
Figure 7 CA Simulation of SIS System


This simulation was created using two-state cells (0 & 1) with neighboring cell connections,
additional random connections on the order of 0.5% of the total possible connections, and a
universe of 100 cells. The simulation was run for 50 generations but is cut off in this graph as
there was no change in cell count after 8 generations. SIS rules were in place that allowed an
infected cell to transfer its infection to any connected cells and then become susceptible in
subsequent time steps.
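
One possible reading of these SIS rules is sketched below. It is an interpretation, not the tool's actual code: an infected cell infects every connected cell in the next generation and itself returns to the susceptible state unless a connected cell reinfects it. The helper names and the five-cell example are assumptions; only neighbor connections are used here, without the additional random links.

```python
import random

def line_adjacency(n):
    """Default layout: each cell is linked to its left and right neighbors."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1
    return adj

def sis_step(states, adj, transmit=1.0, rng=random):
    """One generation of the assumed SIS rule (0 = susceptible, 1 = infected)."""
    n = len(states)
    new_states = [0] * n  # infected cells revert to susceptible by default
    for j in range(n):
        if states[j] != 1:
            continue
        for i in range(n):
            if adj[j][i] and rng.random() < transmit:
                new_states[i] = 1
    return new_states

def run_sis(states, adj, generations, transmit=1.0, rng=random):
    """Return the infected-cell count for each generation, starting at 0."""
    counts = [sum(states)]
    for _ in range(generations):
        states = sis_step(states, adj, transmit, rng)
        counts.append(sum(states))
    return counts

counts = run_sis([0, 0, 1, 0, 0], line_adjacency(5), generations=2)
```

Averaging the returned counts over several runs, each with a freshly drawn set of random connections, would correspond to the batch runs described for the tool.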

Both the well-known SIS model and the simulation run using our CA tool give similar
logistic patterns of infection growth. These curves correctly model an SIS system, as the
infection initially grows exponentially and then slowly levels off as the infection rate
nears the carrying capacity of the disease. The similarity of these two curves indicates that our
CA tool is a valid method of modeling diseases in an SIS system.
Next, the tool was validated against a more complicated model, the SIR model.
Curves for an SIR system as described in the previous section generally appear similar to the
curve below:


Figure 8 Deterministic Solution of SIR Model (From Chestnut, 2010)

This image was generated from a deterministic solution to the three differential equations that
make up the SIR model using MATLAB, starting with 1 infected individual and over 700
susceptible individuals. Below are the results of an SIR simulation using the CA tool:


Figure  9  CA  Simulation  of  SIR  System 
This simulation was created using three states of cells with neighboring cell connections and a
universe of 50 cells. SIR rules were in place that allowed a susceptible cell to become infected
by any neighboring cell, remain infected for a generation, and then transform into a
resistant cell for the remainder of the time period.
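
These SIR rules translate into a three-state update, sketched below under the assumption that transmission to a connected susceptible cell is certain. The state codes, the helper names, and the five-cell example are illustrative choices and do not come from the tool itself.

```python
SUSCEPTIBLE, INFECTED, RESISTANT = 0, 1, 2

def line_adjacency(n):
    """Each cell is linked to its left and right neighbors."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1
    return adj

def sir_step(states, adj):
    """One generation of the SIR rule described in the text."""
    n = len(states)
    new_states = list(states)
    for i in range(n):
        if states[i] == INFECTED:
            new_states[i] = RESISTANT  # infected for exactly one generation
        elif states[i] == SUSCEPTIBLE:
            if any(adj[j][i] and states[j] == INFECTED for j in range(n)):
                new_states[i] = INFECTED
        # resistant cells stay resistant for the rest of the run
    return new_states

def run_sir(states, adj, generations):
    """Return the list of generations, including the starting one."""
    history = [list(states)]
    for _ in range(generations):
        states = sir_step(states, adj)
        history.append(states)
    return history

history = run_sir([0, 0, 1, 0, 0], line_adjacency(5), generations=3)
```

Counting the cells in each state per generation gives the three curves that are compared against the deterministic solution.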

The deterministic solution and the CA simulation are similar in appearance. Both figures
correctly model the initial rapid increase in infections, the resulting decrease in susceptibility, and
the follow-on increase in resistance to infection. The infection rate slows and drops off as the
number of resistant individuals/cells becomes greater than the number of susceptible ones. Both
the infection rate and susceptible rate approach zero as all individuals/cells become resistant to
the infection.

One important aspect to note about these two models is that they both are based on the
assumption of random mixing. CA does not allow for random mixing to occur. Instead, the
CA universe is set at the beginning of a run and does not change throughout the period of a
simulation. This is an important distinction to make, as random mixing cannot always be
assumed. In reality it cannot be expected that an individual will spread an infection to an area
where that individual has never been. Disease is spread along channels of contacts, and so the
spread of a disease is best modeled using an underlying graph of the connections between
people.
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B.  Findings 


Spread  of  Information  through  a  Community 

Obviously it is difficult, if not impossible, to know the underlying graph of a large network, especially if the nodes of that network are as uncontrollable as people can be. It is unrealistic to expect a modeler to know all of the connections of such a graph, but one could know the average number of connections between nodes. The CA tool used in this paper allows for the customization of the connectedness of the graph by randomly apportioning a percentage of the total possible connections. This is not the same as random mixing: once the simulation begins, the connections do not change. Depending on the real-life situation being modeled, this may or may not be a more accurate picture of what is being modeled.
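One way to picture this random apportioning is the sketch below. It is only an illustration of the idea, not the CA tool's code. The function name `random_connections` is hypothetical, and two modeling choices are assumptions made here: connections are treated as ordered (directed) pairs, and a node is never connected to itself. The number of connections is taken as the given fraction of n × n, following the way the paper counts total possible connections.

```python
import random

def random_connections(n_nodes, fraction, seed=None):
    """Choose round(fraction * n_nodes**2) distinct directed connections at random."""
    rng = random.Random(seed)
    # Candidate connections: every ordered pair of different nodes.
    candidates = [(a, b) for a in range(n_nodes) for b in range(n_nodes) if a != b]
    count = round(fraction * n_nodes * n_nodes)
    # A frozenset makes it explicit that the topology is fixed once drawn.
    return frozenset(rng.sample(candidates, count))
```

Drawing the set once before a run and never modifying it is what separates this from random mixing: the randomness is in how the graph is built, not in who meets whom at each generation.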

One of the situations modeled with the CA tool that showed the importance of topology was the flow of information from person to person through a community. Using a few assumptions to create a very simple scenario, we modeled the flow of information with a 100% chance of transmittance between 100 connected individuals. We then varied the number of connections and the distribution of those connections to examine how the topology of a network affected the diffusion.

Figure 10 Information Flow through a Community Neighbors Only
(Plot of Uninformed and Informed counts versus Generations.)


The figure above, though unremarkable, shows the flow of information from one individual through an entire community in which each individual may transfer information only to their two immediately adjacent neighbors. This is best envisioned as a circle of 100 family houses where each family communicates only with the families directly to the right and left of their own house. This is not a very realistic model, but it provides a baseline for comparison.
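The neighbors-only baseline is simple enough to sketch directly. The code below is an illustrative reconstruction, not the CA tool: it assumes one initially informed family (index 0) and the 100% transmittance stated above, and the helper names are hypothetical.

```python
def ring_neighbors(n):
    """Each house talks only to the houses immediately to its left and right."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def spread(neighbors, start=0):
    """Return the number of informed nodes after each generation (100% transmittance)."""
    informed = {start}
    frontier = {start}
    counts = [1]
    while frontier:
        # Everyone informed in the previous generation tells all of their neighbors.
        frontier = {nb for node in frontier for nb in neighbors[node]} - informed
        if frontier:
            informed |= frontier
            counts.append(len(informed))
    return counts

counts = spread(ring_neighbors(100))
```

Under these assumptions the informed count grows by exactly two per generation (1, 3, 5, ...) until the two wavefronts meet on the far side of the circle, so all 100 houses are informed after 50 generations. That is the linear growth visible in Figure 10.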

The next simulation eliminated the rigid left and right neighborhood structure and replaced it with a random assortment of connections consisting of 2% of the total possible number of connections among the individual families. Since there are 100 families, there are 100 × 100 = 10,000 possible connections; 2% of these is 200 connections, the same number as in the previous simulation, where each of the 100 families had a left and a right neighbor. The number of connections in this simulation is therefore the same as before, but the placement of those connections is not. The average result of this model of information flow through a community (for 500 runs) is depicted in Figure 11 below.

Figure 11 Information Flow through a Community 2% Connectedness
(Plot of Uninformed and Informed counts versus Generations.)

In this figure one does not see the same linear growth that occurred in the scenario depicted by Figure 10. Additionally, notice that the number of informed families does not, and will not, reach 100% of the total. This means that the entire community does not always receive the information being passed along. Sometimes there are families that are isolated from the rest of the community and thus cannot receive the information from anyone else. The time scale is also much shorter in this scenario. Whereas in the previous scenario it took 50-51 generations for the entire community to become informed, in this scenario it took only eight generations for almost all of the community to become informed. Given these results, it appears that randomizing the connections between families increased the rate of transference but reduced the possibility of the entire community becoming informed.
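The averaged experiment can be sketched as follows. This is a hedged illustration rather than a reproduction of the runs behind Figure 11: connections are assumed to be directed, self-connections are excluded, the starting family is chosen at random, and all function names are hypothetical. Because of those assumptions, the saturation level this sketch produces need not match the figure.

```python
import random

def random_links(n, fraction, rng, pairs):
    """Draw round(fraction * n * n) directed links; links[a] lists who a informs."""
    links = {i: [] for i in range(n)}
    for a, b in rng.sample(pairs, round(fraction * n * n)):
        links[a].append(b)
    return links

def informed_per_generation(links, start, generations):
    """Count informed nodes after each generation with 100% transmittance."""
    informed = {start}
    frontier = {start}
    counts = [1]
    for _ in range(generations):
        frontier = {b for a in frontier for b in links[a]} - informed
        informed |= frontier
        counts.append(len(informed))
    return counts

def average_runs(n=100, fraction=0.02, runs=500, generations=20, seed=1):
    """Average the informed count per generation over many random topologies."""
    rng = random.Random(seed)
    pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
    totals = [0] * (generations + 1)
    for _ in range(runs):
        links = random_links(n, fraction, rng, pairs)  # fixed for the whole run
        counts = informed_per_generation(links, rng.randrange(n), generations)
        totals = [t + c for t, c in zip(totals, counts)]
    return [t / runs for t in totals]
```

Each run draws a fresh topology and then keeps it fixed while the information spreads. Any family that no one links to can never be informed unless it is the starting family, which is one way an averaged curve can level off below 100.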


Reducing the number of connections to 1% of the total possible connections significantly reduced the chance of the entire community becoming informed but did not substantially change the rate at which transference occurred. Figure 12 below depicts the averaged results of a batch of 500 runs of a community randomly connected with 1% of the total possible connections.


Figure 12 Information Flow through a Community 1% Connectedness
(Plot of counts versus Generations.)


The smaller number of connections increased the likelihood that some families would be entirely disconnected from the part of the community connected to the one informed family. The community reached maximum information saturation after 9 generations, which was not appreciably different from the time the 2% connectedness simulation took to reach saturation.

Disease Transmission and Expected Infection Rates


Another scenario tested was a threshold-based model that was a variant of the SIS model. As in the SIS model, each cell had two states, infected and susceptible, but there were also transmission and recovery rates associated with each transformation of cell state. We initially chose a transmission rate of 35% and a recovery rate of 50% because they gave significant variability in the results. Figure 13 below depicts simulations using left and right neighbor connections plus an additional 0.0%, 0.50%, 1.0%, or 1.50% of connections determined at random. Each simulation is given its own color. Each starts with 10% of the population infected and 90% susceptible, with a total population of 100 cells.


Figure 13 35% Transmission Rate & Various Connectedness


From these results it appears that higher connectedness leads to higher infection rates in the population. What is surprising is how large a difference the added connections made. With just neighbor connections (0.0%), the infection died off fairly quickly, since recovery was more likely than transmission. But the added connections quickly overcame the difference between the transmission and recovery rates, allowing the infection to maintain a long-term presence in the population. Traditional models predict that with random mixing a disease will die off if its transmission rate is significantly smaller than its recovery rate; see equation 4 (SIR Ratio). But the CA simulation shows that the infection rate is highly dependent upon the connectedness of the network; in fact, a disease may persist in a population even when its transmission rate is smaller than its recovery rate.
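A scenario of this kind can be explored with a short sketch like the one below. The paper does not spell out the tool's exact update rule, so the rule used here is an assumption: in each generation every infected cell recovers with probability `p_recover`, and every susceptible cell is infected with probability `p_transmit` independently by each infected cell linked to it. The extra random links are assumed to be directed, and all names are hypothetical.

```python
import random

def build_sources(n, extra_fraction, rng):
    """sources[b] lists the cells that can infect cell b: ring neighbors plus random extras."""
    sources = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
    candidates = [(a, b) for a in range(n) for b in range(n)
                  if a != b and a not in sources[b]]
    for a, b in rng.sample(candidates, round(extra_fraction * n * n)):
        sources[b].append(a)
    return sources

def simulate_sis(n=100, extra_fraction=0.0, p_transmit=0.35, p_recover=0.50,
                 initial_infected=10, generations=100, seed=0):
    """Return the number of infected cells after each generation."""
    rng = random.Random(seed)
    sources = build_sources(n, extra_fraction, rng)  # topology fixed for the run
    infected = set(rng.sample(range(n), initial_infected))
    counts = [len(infected)]
    for _ in range(generations):
        nxt = set()
        for cell in range(n):
            if cell in infected:
                if rng.random() >= p_recover:
                    nxt.add(cell)  # did not recover this generation
            elif any(src in infected and rng.random() < p_transmit
                     for src in sources[cell]):
                nxt.add(cell)      # newly infected by a linked cell
        infected = nxt
        counts.append(len(infected))
    return counts

if __name__ == "__main__":
    for extra in (0.0, 0.005, 0.01, 0.015):
        run = simulate_sis(extra_fraction=extra)
        print(f"extra connections {extra:.1%}: infected after last generation = {run[-1]}")
```

Single runs are noisy, so any comparison across connectedness levels should average over many seeds before drawing conclusions.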


When the rates are reversed, with a transmission rate of 50% and a recovery rate of 35%, the expected persistent infection rates emerge even with just left and right neighbor connections. The additional random connections push the infection rate higher than 50%. These two scenarios show the immense effect that the connectedness of a network can have on the spread of a pathogen through a community.


Excel  Model  Examples 


Figure 5 (repeated below for reference) of the Excel Model of Infection showed clearly that some individuals were more likely to become infected than others, even when all individuals in a population initially have the same likelihood of infection.

It is apparent from this figure that some individuals (9 & 10) are far more likely to become infected than others (4 & 1). In fact, you can see in the figure below that the distribution of infection likelihood is dramatically different for almost every member of this population. The large ballooning of the infection rates for various cells shows exactly how much the underlying connections can affect individual infection rates even with the same starting infection rate.

Figure 15 Individual Infection Rates
(Plot of Probability of Infection versus Generations.)


The graph of the overall average infection rate shows none of the variability that Figure 15 above shows. While being able to compute average infection rates for a population may be useful for some statistical inferences, it really serves only to mask the large amount of variability that exists in the population. As we have seen, this variability is important and can result in second- and third-order effects that would go unnoticed with vast amounts of averaging.

V.  CONCLUSION  AND  RECOMMENDATIONS 

From  the  information  flow  scenario  and  disease  transmission  scenario  we  can  glean  two 
important  lessons.  The  first  is  that  the  distribution  of  the  connections  in  a  network  can  have  a 
substantial  impact  on  the  diffusion  of  something,  say  information,  through  a  network.  The 
second  is  that  the  connectedness  of  a  network  may  also  have  a  large  effect  on  the  spread  of 
something,  say  a  disease,  through  a  network.  These  two  facts  highlight  the  importance  of  the 
underlying  graph  that  represents  a  network  when  modeling  movement  of  an  impulse  through  that 
network.  This  impulse  may  be  a  message  or  it  may  be  a  virus  but  it  is  clear  from  these 
experiments  that  random  mixing  is  not  always  the  best  assumption. 

There are certainly times when random mixing is a good assumption to make. Within a house, for example, it could reasonably be expected that individuals would be moving about the house fairly often and interacting with all individuals in the house at some point or another. This would not be true on a national scale, however, because you would not expect a person on one side of the country to interact often with individuals on another side of the country. Such a situation would be best modeled using a simulation that takes into account something about the underlying distribution of connections between people. This paper should serve to illustrate the idea that random mixing models may not work well for many situations. Additionally, this paper presents an alternate way to simulate spread throughout a network if something about the underlying network graph is known.

A. Recommendations for further study


Many useful tools could be derived from further research into this problem of modeling diffusion. First, in order to refine the current CA modeling, it would be useful to model real-life networks, or at least random networks that are similar to real ones. There is currently a lot of research in the area of pseudo-random graph generation, and this project would be more complete if it incorporated some of the findings of that research. The random networks generated in this project are completely random and do not properly simulate the social networks one might find in a community of people. For example, random small-world networks would better model a social network, but generating them would require substantially more coding and could not be incorporated into this project.

A second recommendation is further research into the development of theoretical models of diffusion. This paper offers a simulation tool that can model diffusion, but the tool does not lend itself nicely to strictly theoretical mathematical models. A single formula or series of equations to determine long-term infection rates would be considerably more useful than a computer simulation. Additional research in this area should first focus on obtaining a theoretical model for diffusion.

Another interesting idea for future study is pinpointing changes in behavior based on topology. It is clear from this paper that certain changes in topology do have an effect on diffusion. Finding out exactly which topological changes cause which changes in behavior is an area yet to be researched. It would be interesting and useful to know exactly where the tipping points are that cause a diffusion model to exhibit such different results. Knowing the tipping points would indicate, from the outset, what kind of diffusion behavior could be expected when examining a particular network configuration.